ABSTRACT 

Although it is widely known that special assumptions are needed for
univariate analysis of repeated measures data, researchers seldom
examine their data for violation of these assumptions. This paper
reviews ways in which repeated measures analyses are usually handled
and describes limitations of these methods. A design with two within
subject factors (3x3) was tested with a computer simulation of 1,000
such experiments (each with 30 subjects) to examine the bias of
alternate test procedures with data similar to that which might
reasonably be observed. Two data structures were used, with small and
large violations of the univariate assumptions. Four methods of
analysis were compared: unadjusted univariate, Geisser-Greenhouse
conservative test, epsilon correction, and multivariate analysis. The
multivariate test was the only procedure for which the empirical alpha
error rate did not differ reliably from the nominal alpha for any
effect tested here. It is recommended that multivariate procedures be
used for analysis of repeated measures designs when sample size
permits.
A Simulation Comparison of Univariate and Multivariate Analyses
of a Multi-factor Repeated Measures Design¹

Dale E. Berger and Susan C. Selhorst, Claremont Graduate School

In this paper we will review the ways in which repeated measures
analyses are usually handled, describe limitations of these methods,
present the results of an empirical comparison of four procedures, and
make recommendations for the selection of a test.
Assumptions for Univariate Analysis 

It is widely known that the conventional univariate test of significance
for a within subjects factor may be positively biased when the data
violate the "symmetry" assumption. This assumption is satisfied if a) the
population variances within each treatment level on the repeated factor
are homogeneous, and b) the covariances between these treatment levels are
homogeneous. Together, these assumptions are also called "symmetry of
covariance matrices" (Kirk, 1968), or "compound symmetry" (Scheffe, 1959).
In addition, if there is a between-groups factor (or factors), it is
necessary to assume that the covariance matrices for the repeated measures
are identical for all groups.

It is less well known that the symmetry assumption specifies sufficient 
but not necessary conditions for the conventional univariate tests of the 
repeated factor to be valid. The actual necessary and sufficient condition 
is that any set of k-1 orthogonal normalized (orthonormal) contrasts on the 
repeated factor with k levels should generate a covariance matrix that has 
a "sphericity" pattern (i.e., equal variances and zero covariances) (Huynh 
& Feldt, 1970). This pattern is also called "circularity." 
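
The condition can be checked numerically for a given covariance matrix. The following is a minimal sketch in Python with NumPy, not part of the original study: the function names and the three example matrices are hypothetical illustrations. It builds normalized Helmert contrasts (one possible orthonormal set), forms the covariance matrix of the contrasts, and tests whether that matrix is a constant times the identity. The second example has unequal variances and unequal covariances but still yields spherical contrasts, which illustrates why compound symmetry is sufficient but not necessary.

```python
import numpy as np

def orthonormal_contrasts(k):
    """Return a (k-1) x k matrix of normalized Helmert contrasts."""
    rows = []
    for j in range(1, k):
        c = np.zeros(k)
        c[:j] = 1.0        # the first j levels ...
        c[j] = -float(j)   # ... contrasted with level j+1
        rows.append(c / np.linalg.norm(c))
    return np.array(rows)

def contrast_covariance(sigma):
    """Covariance matrix of the k-1 orthonormal contrasts: C sigma C'."""
    sigma = np.asarray(sigma, dtype=float)
    c = orthonormal_contrasts(sigma.shape[0])
    return c @ sigma @ c.T

def is_spherical(sigma, tol=1e-8):
    """True if the contrasts have equal variances and zero covariances."""
    m = contrast_covariance(sigma)
    return bool(np.allclose(m, m[0, 0] * np.eye(m.shape[0]), atol=tol))

# Hypothetical example 1: compound symmetry.
compound = np.array([[1.0, 0.5, 0.5],
                     [0.5, 1.0, 0.5],
                     [0.5, 0.5, 1.0]])

# Hypothetical example 2: sigma_ij = a_i + a_j (+ 1 on the diagonal),
# with a = (0.5, 1.0, 1.5). Not compound symmetric, yet spherical.
huynh_feldt = np.array([[2.0, 1.5, 2.0],
                        [1.5, 3.0, 2.5],
                        [2.0, 2.5, 4.0]])

# Hypothetical example 3: correlations fall off with separation in time.
longitudinal = np.array([[1.0, 0.8, 0.5],
                         [0.8, 1.0, 0.8],
                         [0.5, 0.8, 1.0]])

for name, s in [("compound", compound),
                ("huynh_feldt", huynh_feldt),
                ("longitudinal", longitudinal)]:
    print(name, is_spherical(s))
```
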

In practice, researchers seldom examine repeated measures data for
violations of assumptions. Jennings and Wood (1976) tabulated the use of
repeated measures analysis in the 1975 volume of Psychophysiology. They
found that of 56 articles with repeated measures analysis, 47 ignored the
sphericity (or symmetry) assumption, and 34 reported at least one F ratio
that would not have been significant if the conservative Geisser-Greenhouse
test had been used. The severity of violation of sphericity can be very
substantial, as we discovered in our own longitudinal data from a
developmental study. On reflection, it seems obvious that departures from
sphericity should be expected with longitudinal data since covariances
are naturally larger between measures close together in time than between
measures taken farther apart.

The degree of sphericity of a covariance matrix of k levels can be
indexed by a coefficient epsilon, which varies from 1 for perfect
sphericity to a lower limit of 1/(k-1) (Box, 1954). Simulation studies
have shown that the standard unadjusted univariate tests are usually
reasonably accurate when epsilon is greater than 0.7 (e.g., Rogan,
Keselman, & Mendoza, 1979). Box (1950, 1954) developed methods for
testing whether epsilon differs from 1.0, but there are at least two
reasons why these tests are not likely to be useful: (1) the tests are
not very powerful for small samples, which is where the bias in the
univariate test is greatest; and (2) the test of epsilon is about as complex
as using it to adjust the degrees of freedom, so one might just as well
make the epsilon adjustment and bypass the test of significance.
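
As an illustration of how epsilon is obtained from a covariance matrix, the sketch below uses one common closed form of Box's coefficient, written in terms of the diagonal mean, the row means, and the grand mean of the matrix. The function name `box_epsilon` and the example matrices are assumptions made for this illustration; they are not taken from the study. Applied to a sample covariance matrix, the same formula gives a sample estimate of epsilon.

```python
def box_epsilon(sigma):
    """Box's epsilon for a k x k covariance matrix given as nested lists.

    epsilon = k^2 (mean_diag - grand_mean)^2 /
              ((k-1) (sum s_ij^2 - 2k sum rowmean_i^2 + k^2 grand_mean^2))
    """
    k = len(sigma)
    mean_diag = sum(sigma[i][i] for i in range(k)) / k
    row_means = [sum(row) / k for row in sigma]
    grand_mean = sum(row_means) / k
    sum_sq = sum(x * x for row in sigma for x in row)
    numerator = (k * (mean_diag - grand_mean)) ** 2
    denominator = (k - 1) * (
        sum_sq
        - 2 * k * sum(m * m for m in row_means)
        + (k * grand_mean) ** 2
    )
    return numerator / denominator

# Hypothetical matrices for illustration.
compound = [[1.0, 0.5, 0.5],
            [0.5, 1.0, 0.5],
            [0.5, 0.5, 1.0]]      # perfect sphericity

longitudinal = [[1.0, 0.8, 0.5],
                [0.8, 1.0, 0.8],
                [0.5, 0.8, 1.0]]  # correlations fall off with time

worst_case = [[1.0, 0.0, -1.0],
              [0.0, 0.0, 0.0],
              [-1.0, 0.0, 1.0]]   # all contrast variance on one dimension

for name, s in [("compound", compound),
                ("longitudinal", longitudinal),
                ("worst_case", worst_case)]:
    print(name, round(box_epsilon(s), 3))
```

For k = 3 the coefficient lies between 1/(k-1) = .5 and 1, and the three examples span that range.
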
Analysis of Repeated Measures 

Several approaches have been suggested to avoid positive bias in 
repeated measures designs. In some applications, it may be possible to 
avoid heterogeneity among correlations between treatment levels by randomizing
or counterbalancing the order of presentation. However, this procedure will
not remove correlations between similar treatment conditions, and it is not
even possible in longitudinal studies.

The sphericity assumption can be avoided altogether with nonparametric
tests, such as the Friedman two-way ANOVA. Nonparametric tests are not
attractive substitutes because they test different hypotheses, and they do
not make efficient use of data.

Perhaps the simplest procedure is the Geisser-Greenhouse conservative
test wherein the degrees of freedom for the F test are multiplied by the
smallest possible value of epsilon, 1/(k-1). Although this procedure has
received wide endorsement, and it certainly does have the advantage of
simplicity, its disadvantage is that it is much too conservative when the
sphericity assumption is approximately satisfied. Thus, routine application
of the Geisser-Greenhouse conservative adjustment is inappropriate.

A more accurate procedure is to multiply the degrees of freedom for the
F test by an estimate of the population value of epsilon based on the sample
variance-covariance matrix for the repeated factor (Box, 1954). This
procedure is quite accurate, although it does not have good reliability when
the number of observations per group is less than 15 (Collier, Baker,
Mandeville, & Hayes, 1967). Probably the main reason the epsilon correction
has not been used more is that computation of epsilon is not easily done by
hand, and it has not been provided by popular statistical computer packages.
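
The two adjustments differ only in the multiplier applied to the degrees of freedom. The sketch below is an illustration under stated assumptions, not the authors' program: it covers a single within subjects factor with k levels and one group of n subjects, and the helper names are hypothetical. It also shows the difference between rounding and truncating fractional degrees of freedom when integer values are needed for a table lookup.

```python
import math

def adjusted_df(k, n, epsilon=1.0):
    """Degrees of freedom for the F test of a k-level within factor with
    n subjects (one group), multiplied by epsilon."""
    df1 = (k - 1) * epsilon
    df2 = (k - 1) * (n - 1) * epsilon
    return df1, df2

def conservative_df(k, n):
    """Geisser-Greenhouse conservative test: epsilon at its lower limit."""
    return adjusted_df(k, n, 1.0 / (k - 1))

def to_integer_df(df, mode="round"):
    """Convert fractional df to an integer by rounding or truncating."""
    if mode == "round":
        return int(math.floor(df + 0.5))
    if mode == "truncate":
        return int(math.floor(df))
    raise ValueError("mode must be 'round' or 'truncate'")

k, n = 3, 30
print("unadjusted:  ", adjusted_df(k, n))
print("conservative:", conservative_df(k, n))
print("epsilon .636:", adjusted_df(k, n, 0.636))
```

With k = 3 and n = 30, the unadjusted test uses 2 and 58 degrees of freedom, the conservative test uses 1 and 29, and an epsilon of .636 gives values in between.
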

A three-step approach to testing significance of an F ratio for a
repeated measures factor was first proposed by Greenhouse and Geisser (1959),
and this approach has been endorsed by some standard textbooks on analysis
of variance (e.g., Kirk, 1968). If the F ratio is not significant with the
unadjusted degrees of freedom, the procedure is stopped and the null
hypothesis is not rejected, since the null hypothesis would not be
rejected with reduced degrees of freedom. If the F ratio is significant
both with unadjusted degrees of freedom and with the conservative
Geisser-Greenhouse adjustment by the smallest possible value of epsilon, then
the null hypothesis can be rejected without further testing. If the F ratio
is significant with unadjusted degrees of freedom, but not significant with
the conservative test, then epsilon should be estimated from the data and
used to adjust the degrees of freedom.
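
The three steps can be sketched as a short decision function. This is an illustrative sketch only, assuming a single within subjects factor with k levels and one group of n subjects; the function name `three_step_test` and its return format are hypothetical. Upper-tail probabilities come from SciPy's `scipy.stats.f.sf`, which accepts fractional degrees of freedom.

```python
from scipy.stats import f as f_dist

def three_step_test(f_ratio, k, n, epsilon_hat, alpha=0.05):
    """Return (reject, step), where step is the stage at which the
    decision was reached (1, 2, or 3)."""
    df1 = k - 1
    df2 = (k - 1) * (n - 1)

    # Step 1: unadjusted test. If it fails, no adjusted test can succeed.
    if f_dist.sf(f_ratio, df1, df2) >= alpha:
        return False, 1

    # Step 2: conservative test with epsilon at its lower limit 1/(k-1).
    lower = 1.0 / (k - 1)
    if f_dist.sf(f_ratio, df1 * lower, df2 * lower) < alpha:
        return True, 2

    # Step 3: only now is the sample estimate of epsilon needed.
    p = f_dist.sf(f_ratio, df1 * epsilon_hat, df2 * epsilon_hat)
    return bool(p < alpha), 3

# Hypothetical F ratios for a 3-level factor with 30 subjects.
for f_ratio, eps in [(1.0, 0.80), (10.0, 0.80), (3.9, 0.95), (3.5, 0.55)]:
    print(f_ratio, eps, three_step_test(f_ratio, 3, 30, eps))
```

The appeal of the procedure is that epsilon has to be computed only for F ratios that fall between the two extremes.
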

The final approach to be discussed here is multivariate analysis. In
this approach, the k measures from a given individual are recast into a set
of (k-1) contrasts. These contrasts are used as multiple dependent measures
for each individual to test the multivariate null hypothesis that the mean
of each contrast in the set is zero. No assumptions need to be made about
the form of the variance-covariance matrix, although the form is assumed to
be the same for all treatment groups. The method provides an exact test,
even for complex designs. A limitation is that the multivariate test is
less powerful than the univariate tests when the sphericity assumption is
satisfied, especially when the sample size is small. In fact, the
multivariate test cannot be calculated at all if the sample size does not
exceed the number of measurements plus the number of treatment groups by at
least one. Otherwise, the pooled within subjects variance-covariance matrix
does not have an inverse, and multivariate computations are not possible.
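
For a single group and a single within subjects factor, this multivariate test reduces to a one-sample Hotelling T-squared test on the k-1 contrasts. The sketch below is a simplified illustration under that assumption, not the SPSS MANOVA procedure used in the study; the function name is hypothetical. Successive differences serve as the contrasts, which is permissible because T-squared does not depend on which nonsingular set of k-1 contrasts is chosen. The sample size check is only the bare requirement for the contrast covariance matrix to be invertible in the one-group case.

```python
import numpy as np

def multivariate_within_test(data):
    """One-sample Hotelling T-squared test that all k-1 contrast means
    are zero. `data` is an n x k array (subjects by repeated measures).
    Returns (F, df1, df2)."""
    y = np.asarray(data, dtype=float)
    n, k = y.shape
    p = k - 1
    if n <= p:
        raise ValueError("need more subjects than contrasts")

    d = y[:, 1:] - y[:, :-1]                       # k-1 difference contrasts
    mean = d.mean(axis=0)
    cov = np.atleast_2d(np.cov(d, rowvar=False))   # unbiased, n-1 divisor

    t2 = n * mean @ np.linalg.solve(cov, mean)     # Hotelling's T-squared
    f_stat = (n - p) / (p * (n - 1)) * t2          # exact F transformation
    return float(f_stat), p, n - p

# Hypothetical data: 30 subjects, 3 measures, no true differences.
rng = np.random.default_rng(0)
scores = rng.normal(size=(30, 3))
print(multivariate_within_test(scores))
```
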

It should be noted that it is not appropriate to make unqualified
statements about the relative power of the multivariate and univariate
approaches, since the null hypotheses are not the same. For different sets
of data, the multivariate test may be more powerful than the unadjusted
univariate test, or less powerful than the conservative Geisser-Greenhouse
test (Romaniuk, Levin, & Hubert, 1977). It is not the case that the
multivariate test will always produce a probability value between the
extremes of these two univariate test procedures.
Method 



Relatively little empirical work has been done with complex designs
with two or more factors with repeated measures. In the current study, we
used computer simulation to examine the bias of alternate test procedures
with data similar to that which might reasonably be observed. We assumed a
research situation in which measures are taken under three conditions (C
factor) at each of three different times (T factor), with sample N = 30.
We constructed two data structures, with small and large departures from
sphericity. Correlations were constructed to vary inversely with separation
in time, and were higher between conditions 1 and 2 than between 1 and 3 or
2 and 3. The epsilon values for the factors ranged from .96 to .64, and
for the interactions were .71 and .54. The epsilon values for the
interactions were computed as products of the epsilons for the corresponding
factors (McHugh, Sivanich, & Geisser, 1961).

A total of 1000 computer simulations were done on each data set, using
four different methods: 

1) unadjusted univariate; 

2) Geisser-Greenhouse conservative test; 

3) epsilon correction using sample values of epsilon; and 

4) multivariate analysis, using MANOVA. 

The actual population means were all equal, so that an unbiased test would 
produce (false) "significant" results at the rate set by alpha. 
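
The logic of such a simulation can be sketched in a few lines. The code below is a deliberately reduced illustration, not a reconstruction of the study: it uses one within subjects factor with three levels instead of the 3x3 design, applies only the unadjusted univariate test, and draws from a hypothetical covariance matrix, since the population matrices used in the study are not reproduced in this paper. Function names and the random seed are likewise assumptions.

```python
import numpy as np
from scipy.stats import f as f_dist

def univariate_f(y):
    """Unadjusted repeated measures F ratio for an n x k data array."""
    n, k = y.shape
    grand = y.mean()
    ss_treat = n * ((y.mean(axis=0) - grand) ** 2).sum()
    ss_subj = k * ((y.mean(axis=1) - grand) ** 2).sum()
    ss_error = ((y - grand) ** 2).sum() - ss_treat - ss_subj
    return (ss_treat / (k - 1)) / (ss_error / ((k - 1) * (n - 1)))

def empirical_alpha(sigma, n=30, reps=1000, alpha=0.05, seed=0):
    """Proportion of replications in which a true null is rejected."""
    rng = np.random.default_rng(seed)
    k = sigma.shape[0]
    critical = f_dist.ppf(1 - alpha, k - 1, (k - 1) * (n - 1))
    rejections = 0
    for _ in range(reps):
        # All population means are equal (zero), so every rejection
        # is a Type I error.
        y = rng.multivariate_normal(np.zeros(k), sigma, size=n)
        if univariate_f(y) > critical:
            rejections += 1
    return rejections / reps

# Hypothetical covariance structures.
spherical = np.array([[1.0, 0.5, 0.5],
                      [0.5, 1.0, 0.5],
                      [0.5, 0.5, 1.0]])
nonspherical = np.array([[1.0, 0.8, 0.5],
                         [0.8, 1.0, 0.8],
                         [0.5, 0.8, 1.0]])

print("spherical:   ", empirical_alpha(spherical))
print("nonspherical:", empirical_alpha(nonspherical))
```

With an unbiased test the first rate should fall near the nominal alpha; the second matrix violates sphericity, so the unadjusted test would be expected to reject somewhat too often.
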
Results and Conclusions

The results are summarized in Table 1, and can be described as follows:

1) The unadjusted univariate test was too liberal, especially for
epsilon values below .7.

2) The Geisser-Greenhouse test was too conservative, especially for
epsilon larger than .7.

3) The epsilon correction was reasonably unbiased except for very small
values of epsilon. The tabled values are for rounded degrees of freedom;
when degrees of freedom were truncated, the test procedure became much too
conservative.

4) The multivariate test was accurate throughout. 

It should be noted that about a third of the tests that were significant
with the multivariate test were not significant with the unadjusted
univariate test. We interpret this to mean that one does not have to
construct highly artificial data to find a case where the multivariate
procedure is more sensitive than the most powerful univariate test.

If freedom from bias is desired, neither the unadjusted univariate
test nor the Geisser-Greenhouse conservative test is appropriate. The
former can be much too liberal, while the latter is generally much too
conservative. The epsilon adjustment is a great improvement, but it may
be too liberal for very small values of epsilon. The multivariate
procedure did not depart from the nominal alpha at any level of epsilon
tested here.

With the recent addition of MANOVA to SPSS, access to the multivariate
test should no longer be an obstacle. Our recommendation is that
multivariate procedures routinely be used for analysis of repeated measures
designs when sample size permits.

Footnote

¹The computer simulation was conducted by the second author as part
of a master's thesis under the supervision of the first author. Portions
of this paper were presented at the meeting of the Western Psychological
Association in Los Angeles, 1981.

Table 1

Empirical Type I Error Rates for Each of Four Test Procedures

              Unadjusted       Conservative     Epsilon
              Univariate       Adjustment       Adjustment       MANOVA
Alpha:        .05     .01      .05     .01      .05     .01      .05     .01
Epsilon(a)
  .960        .049    .007     .014**  .003*    .048    .007     .044    .010
  .845        .053    .017*    .029**  .004     .049    .015     .047    .008
  .736        .063    .017*    .032**  .004     .043    .004     .045    .007
  .707        .074**  .028**   .014**  .000**   .047    .012     .049    .013
  .636        .082**  .028**   .048*   .004     .051    .009     .055    .006
  .537        .107**  .055**   .035*   .005     .067*   .026**   .055    .013

Each entry is based on 1000 replications; the design is 3x3 repeated
measures with 30 subjects.

(a) Epsilon values for T, C, and TxC, respectively, are .736, .960, and
.707 for Data Set 1, and .636, .845, and .537 for Data Set 2.

*Empirical probability outside 95% confidence interval.

**Empirical probability outside 99% confidence interval.
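
The paper does not state how the confidence intervals were computed. A plausible reading, offered here only as an assumption, is a normal approximation to the binomial distribution of the rejection rate around the nominal alpha with 1000 replications. The sketch below uses hypothetical function names and the conventional multipliers 1.960 and 2.576.

```python
import math

def alpha_interval(alpha, reps, z):
    """Normal-approximation interval for an empirical rejection rate
    when the true rate equals the nominal alpha."""
    half_width = z * math.sqrt(alpha * (1.0 - alpha) / reps)
    return alpha - half_width, alpha + half_width

def flag(rate, alpha, reps=1000):
    """Return '**' if the rate is outside the 99% interval, '*' if it is
    outside the 95% interval only, and '' otherwise."""
    lo99, hi99 = alpha_interval(alpha, reps, 2.576)
    lo95, hi95 = alpha_interval(alpha, reps, 1.960)
    if not lo99 <= rate <= hi99:
        return "**"
    if not lo95 <= rate <= hi95:
        return "*"
    return ""

for alpha in (0.05, 0.01):
    print(alpha, alpha_interval(alpha, 1000, 1.960),
          alpha_interval(alpha, 1000, 2.576))
```

Under this assumption, an empirical rate for alpha = .05 would be flagged once it falls more than about .0135 from the nominal value.
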



